The main objective of the work presented in this paper was to develop a complete system that would accomplish the original vision of the MALACH project: to employ automatic speech recognition and information retrieval techniques to provide improved access to the large video archive of recorded testimonies of Holocaust survivors. So far, the system has been developed only for the Czech part of the archive. It takes advantage of a state-of-the-art speech recognition system tailored to the challenging properties of the recordings in the archive (elderly speakers, spontaneous speech and emotionally loaded content) and of the close coupling of this system with the actual search engine. The design of the algorithm, which adopts the spoken term detection approach, focuses on retrieval speed. The resulting system is able to search through the 1,000 hours of video constituting the Czech portion of the archive and find occurrences of query words in a matter of seconds. The phonetic search, implemented alongside the search based on lexicon words, makes it possible to find even words outside the ASR system's lexicon, such as names, geographic locations or Jewish slang.